Maximum Likelihood Signature Estimation
Abstract

In this outline, we discuss maximum-likelihood estimates, based on an 
unlabeled sample of observations, of unknown parameters in a mixture of normal 
distributions. Several "successive approximation" procedures for obtaining 
such maximum-likelihood estimates are described. These procedures, which are 
theoretically justified by the local contractibility of certain maps, are designed 
to take advantage of good initial estimates of the unknown parameters. It is 
anticipated that they can be profitably applied to the signature extension 
problem, in which good initial estimates of the unknown parameters are obtained
from segments which are geographically near the segments from which the unlabeled
samples are taken. Additional problems to which these methods are applicable
include estimation of proportions and adaptive classification (estimation of
mean signatures and covariances).

1. Introduction 

Let $\{x_k\}_{k=1}^{N} \subset \mathbb{R}^n$ be an unlabeled sample of observations from a
mixture of $m$ populations, where each population is normally distributed, and
let some (possibly empty) subset of the signature parameters $\{\alpha_i, \mu_i, \Sigma_i\}_{i=1}^{m}$
be known. (Here, $\alpha_i$ is the a priori probability that a sample observation comes
from the $i$th population; $\mu_i$ and $\Sigma_i$ are, respectively, the mean vector and
covariance matrix for observations from the $i$th population in $\mathbb{R}^n$.) A maximum-likelihood
estimate of the remaining parameters is a choice of those parameters
which maximizes the log-likelihood function

$$L = \sum_{k=1}^{N} \log p(x_k).$$
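In this expression, $p$ denotes the mixture density function, i.e., for $x \in \mathbb{R}^n$,

$$p(x) = \sum_{i=1}^{m} \alpha_i\, p_i(x), \qquad p_i(x) = \frac{1}{(2\pi)^{n/2} (\det \Sigma_i)^{1/2}} \exp\left(-\tfrac{1}{2}(x-\mu_i)^{T}\Sigma_i^{-1}(x-\mu_i)\right).$$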
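The following sketch makes the log-likelihood $L$ and the mixture density $p$ concrete. It assumes univariate populations ($n = 1$), so each covariance matrix reduces to a scalar variance. The function names are hypothetical and are not taken from this outline.

```python
import math

def normal_pdf(x, mean, var):
    """Density p_i(x) of a univariate normal population (n = 1 assumed)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def mixture_density(x, alphas, means, variances):
    """Mixture density p(x) = sum_i alpha_i p_i(x)."""
    return sum(a * normal_pdf(x, mu, v) for a, mu, v in zip(alphas, means, variances))

def log_likelihood(sample, alphas, means, variances):
    """Log-likelihood L = sum_k log p(x_k) of an unlabeled sample."""
    return sum(math.log(mixture_density(x, alphas, means, variances)) for x in sample)

# Two populations (m = 2): compare a plausible signature set with a poor one.
sample = [-1.2, -0.8, 0.1, 2.9, 3.3]
good = log_likelihood(sample, [0.6, 0.4], [-0.6, 3.1], [0.5, 0.5])
poor = log_likelihood(sample, [0.6, 0.4], [5.0, 8.0], [0.5, 0.5])
```

A maximum-likelihood procedure searches over the unknown parameters for the largest value of this quantity. In the example, the signature set whose means lie near the data yields the larger `L`.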
Techniques for obtaining maximum-likelihood estimates of this type have
been studied by many authors and are considered by a number of them to be
superior in general to other methods of estimating the parameters of a mixture
of normal distributions. (See, for example, [2] and [6].) Clearly, L is
a differentiable function of the signature parameters to be estimated, and there
are many approaches to obtaining a maximum of such a function. We discuss
several such approaches, each involving "successive approximation" iterative
procedures suggested by the particular form of L. 

The iterative procedures to be described in the following are based upon 
manipulating the gradient of L, with respect to the unknown parameters, and 
incorporating the resulting expressions in fixed-point equations for the 
unknown parameters. Some of these iterative schemes have been studied by other 
authors; others are new. In recent preliminary results of Peters and Walker,
convergence of the iterates has been established for initial estimates within a
sufficiently small neighborhood of the maximum-likelihood estimate. This is
accomplished by establishing that the appropriate maps are locally contractive
at their fixed points. Consequently, these procedures are well suited for
application to the signature extension problem, whenever "reasonable" initial
signature estimates (i.e., those satisfying the contractibility condition) can
be obtained from segments which are geographically near the segments from which 
the unlabeled samples are taken. We discuss the application of these schemes 
to the signature extension problem and other problems. 


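2. The likelihood equations.

The procedures require (for a given unlabeled sample) the calculation of
the partial derivatives of the log-likelihood function with respect to the signature
parameters. Equating the resulting partial derivatives to zero (the necessary
extremum condition, subject to the constraint $\sum_{i=1}^{m} \alpha_i = 1$ on the a priori
probabilities), a straightforward calculation yields the likelihood equations, in which
$p_i$ denotes the normal density of the $i$th population:

$$\alpha_i = \frac{1}{N}\sum_{k=1}^{N} \frac{\alpha_i\, p_i(x_k)}{p(x_k)}, \tag{1.a}$$

$$\mu_i = \left[\sum_{k=1}^{N} \frac{\alpha_i\, p_i(x_k)}{p(x_k)}\right]^{-1} \sum_{k=1}^{N} x_k\, \frac{\alpha_i\, p_i(x_k)}{p(x_k)}, \tag{1.b}$$

$$\Sigma_i = \left[\sum_{k=1}^{N} \frac{\alpha_i\, p_i(x_k)}{p(x_k)}\right]^{-1} \sum_{k=1}^{N} (x_k-\mu_i)(x_k-\mu_i)^{T}\, \frac{\alpha_i\, p_i(x_k)}{p(x_k)}, \tag{1.c}$$

for $i = 1, \dots, m$.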

In the following, we will assume that a solution of any subset of the
likelihood equations is a maximum-likelihood estimate of the corresponding
signature parameters. For example, if a set of mean vectors and covariance
matrices is given, then a maximum-likelihood estimate of the a priori
probabilities is a solution of the equations (1.a).
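In this example case, with the means and covariances fixed, one way to seek a solution of (1.a) is to evaluate its right-hand side repeatedly, starting from an initial guess. The sketch below assumes univariate populations ($n = 1$) with known means and variances. The function names are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    """Univariate normal density (n = 1 assumed for brevity)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def proportions_map(sample, alphas, means, variances):
    """Right-hand side of (1.a): (1/N) * sum_k alpha_i p_i(x_k) / p(x_k)."""
    totals = [0.0] * len(alphas)
    for x in sample:
        terms = [a * normal_pdf(x, mu, v) for a, mu, v in zip(alphas, means, variances)]
        p = sum(terms)  # mixture density p(x_k)
        for i, t in enumerate(terms):
            totals[i] += t / p
    return [s / len(sample) for s in totals]

def estimate_proportions(sample, alphas, means, variances, tol=1e-10, max_iter=1000):
    """Successive approximation for (1.a) with means and variances known."""
    for _ in range(max_iter):
        new = proportions_map(sample, alphas, means, variances)
        done = max(abs(u - v) for u, v in zip(new, alphas)) < tol
        alphas = new
        if done:
            break
    return alphas

# Three observations near -5 and seven near 5, with the population means known.
sample = [-5.0] * 3 + [5.0] * 7
est = estimate_proportions(sample, [0.5, 0.5], [-5.0, 5.0], [1.0, 1.0])
```

The posterior factors $\alpha_i p_i(x_k)/p(x_k)$ sum to one over $i$, so every iterate stays a valid probability vector. For these well-separated populations, the estimate approaches the observed fractions 0.3 and 0.7.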

3. The natural iterative procedure.

The likelihood equations, as given, suggest the following iterative
procedure: beginning with some initial estimate, obtain successive approximations
of the unknown parameters by inserting the preceding approximations in the
expressions on the right-hand sides of the appropriate equations (1.a), (1.b),
(1.c). Such a scheme for obtaining maximum-likelihood estimates has been
investigated by several authors.
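The natural procedure can be sketched as follows, assuming univariate populations ($n = 1$) so that each $\Sigma_i$ is a scalar variance. The function names are hypothetical. Following the description literally, every right-hand side, including the variance update (1.c), is evaluated at the preceding estimate. The log-likelihood is recorded after each step so the behavior of the iterates can be inspected.

```python
import math

def normal_pdf(x, mean, var):
    """Univariate normal density (n = 1 assumed for brevity)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def log_likelihood(sample, alphas, means, variances):
    return sum(
        math.log(sum(a * normal_pdf(x, mu, v) for a, mu, v in zip(alphas, means, variances)))
        for x in sample
    )

def natural_step(sample, alphas, means, variances):
    """Insert the current estimate into the right-hand sides of (1.a), (1.b), (1.c)."""
    N, m = len(sample), len(alphas)
    # w[k][i] = alpha_i p_i(x_k) / p(x_k), the factor shared by all three equations.
    w = []
    for x in sample:
        terms = [a * normal_pdf(x, mu, v) for a, mu, v in zip(alphas, means, variances)]
        p = sum(terms)
        w.append([t / p for t in terms])
    new_alphas, new_means, new_vars = [], [], []
    for i in range(m):
        s = sum(w[k][i] for k in range(N))
        new_alphas.append(s / N)                                           # (1.a)
        new_means.append(sum(w[k][i] * sample[k] for k in range(N)) / s)   # (1.b)
        # (1.c), evaluated at the preceding mean estimate means[i]
        new_vars.append(sum(w[k][i] * (sample[k] - means[i]) ** 2 for k in range(N)) / s)
    return new_alphas, new_means, new_vars

def natural_iteration(sample, alphas, means, variances, iters=100):
    history = [log_likelihood(sample, alphas, means, variances)]
    for _ in range(iters):
        alphas, means, variances = natural_step(sample, alphas, means, variances)
        history.append(log_likelihood(sample, alphas, means, variances))
    return (alphas, means, variances), history

sample = [-2.1, -1.7, -1.2, -0.9, -0.4, 2.2, 2.8, 3.1, 3.5, 4.0]
(est_a, est_m, est_v), hist = natural_iteration(sample, [0.5, 0.5], [-1.0, 2.0], [1.0, 1.0])
```

For this small two-cluster sample, the recorded log-likelihood values should be nondecreasing. That is a general property of this kind of iteration, which is a generalized form of what is now usually called the EM algorithm. It does not by itself guarantee convergence to a maximum-likelihood estimate.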

Empirical studies in [2], [3], and [4] suggest that this scheme is
convergent, even if all the parameters are unknown, and that convergence appears 
to be particularly fast when the populations are "widely separated". 

Unfortunately, the likelihood equations may have several solutions, 
and the iterates may converge to a solution which is not a maximum-likelihood
estimate if care is not taken in the choice of an initial estimate. No 
theoretical evidence of convergence is given in [2], [3], or [4]. 

Coberly and Peters [1] have proved that, if the unknown parameters are
the a priori probabilities, then the scheme is locally convergent, i.e.,
convergent for an initial estimate which is sufficiently near a maximum-likelihood
estimate. They also report on numerical studies in which the computational
feasibility of this procedure is demonstrated. Recent results of Walker
state that the scheme is locally convergent when the unknown parameters are
the means, whenever there are only two populations (i.e., m = 2)
or whenever the populations are "widely separated". The local
convergence results are all achieved by showing that the expressions on the
right-hand sides of the appropriate likelihood equations are locally contractive
functions of the unknown parameters (in some vector norm) near a
maximum-likelihood estimate.
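Local contractivity also suggests a simple empirical check. If the map is contractive near the estimate, the ratios $\|\theta^{(k+1)} - \theta^{(k)}\| / \|\theta^{(k)} - \theta^{(k-1)}\|$ of successive step lengths should settle below 1. The sketch below covers the case in which only the means are unknown, assuming univariate populations with known proportions and variances. The function names are hypothetical. The ratios are a diagnostic only, not a proof of contractivity.

```python
import math

def normal_pdf(x, mean, var):
    """Univariate normal density (n = 1 assumed for brevity)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def means_map(sample, alphas, means, variances):
    """Right-hand side of (1.b) with proportions and variances held fixed."""
    m = len(means)
    sums, wx = [0.0] * m, [0.0] * m
    for x in sample:
        terms = [a * normal_pdf(x, mu, v) for a, mu, v in zip(alphas, means, variances)]
        p = sum(terms)
        for i, t in enumerate(terms):
            sums[i] += t / p
            wx[i] += x * t / p
    return [wx[i] / sums[i] for i in range(m)]

def observed_contraction(sample, alphas, means, variances, tol=1e-8, max_iter=1000):
    """Iterate the means map; return the final means and the ratios of
    successive step lengths, measured in the max norm."""
    steps = []
    for _ in range(max_iter):
        new = means_map(sample, alphas, means, variances)
        steps.append(max(abs(u - v) for u, v in zip(new, means)))
        means = new
        if steps[-1] < tol:
            break
    ratios = [b / a for a, b in zip(steps, steps[1:]) if a > 0.0]
    return means, ratios

sample = [-2.1, -1.7, -1.2, -0.9, -0.4, 2.2, 2.8, 3.1, 3.5, 4.0]
means, ratios = observed_contraction(sample, [0.5, 0.5], [-1.0, 2.0], [1.0, 1.0])
```

For "widely separated" populations such as these, the ratios quickly become small, which is consistent with the rapid convergence reported in that case.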
4. A modified iterative procedure.

We will now describe a modification of the iterative procedure just given
for which more extensive local convergence results have been obtained. Because
these results are not yet sufficiently complete to allow the covariance matrices
$\Sigma_i$ to be unknown parameters, we will give the fixed-point equations for the a
priori probabilities and the mean vectors only. These equations are

$$\alpha_i = (1-\varepsilon)\,\alpha_i + \varepsilon\,\frac{1}{N}\sum_{k=1}^{N}\frac{\alpha_i\, p_i(x_k)}{p(x_k)}, \tag{2.a}$$

$$\mu_i = (1-\varepsilon)\,\mu_i + \varepsilon\left[\sum_{k=1}^{N}\frac{\alpha_i\, p_i(x_k)}{p(x_k)}\right]^{-1}\sum_{k=1}^{N} x_k\,\frac{\alpha_i\, p_i(x_k)}{p(x_k)}, \tag{2.b}$$

for $i = 1, \dots, m$, where $p_i$ is the normal density of the $i$th population and
the scalar parameter $\varepsilon$ is to be determined. Clearly, for $\varepsilon \ne 0$, these equations
are satisfied if and only if the equations (1.a) and (1.b) are satisfied.

If any or all of the a priori probabilities and the mean vectors are
unknown, then the equations (2.a) and (2.b) suggest an iterative scheme
analogous to that associated with equations (1.a), (1.b), and (1.c). Recent
preliminary results of Peters and Walker state that this scheme is locally
convergent when $\varepsilon \le 2/(mn+1)$. If only the means are unknown, then the
scheme is locally convergent for $\varepsilon \le 2/m$. As before, these local
convergence results are obtained by showing that the appropriate maps in the
equations (2.a) and (2.b) satisfy local contractibility conditions near a
maximum-likelihood estimate.
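The modified scheme can be sketched as follows. The sketch assumes that (2.a) and (2.b) have the relaxation form $\theta \leftarrow (1-\varepsilon)\,\theta + \varepsilon\,\Phi(\theta)$, where $\Phi$ denotes the right-hand sides of (1.a) and (1.b). It also assumes univariate populations with known variances. The function names are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    """Univariate normal density (n = 1 assumed for brevity)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def natural_map(sample, alphas, means, variances):
    """Phi: right-hand sides of (1.a) and (1.b); variances are treated as known."""
    N, m = len(sample), len(alphas)
    sums, wx = [0.0] * m, [0.0] * m
    for x in sample:
        terms = [a * normal_pdf(x, mu, v) for a, mu, v in zip(alphas, means, variances)]
        p = sum(terms)
        for i, t in enumerate(terms):
            sums[i] += t / p
            wx[i] += x * t / p
    return [s / N for s in sums], [wx[i] / sums[i] for i in range(m)]

def relaxed_step(sample, alphas, means, variances, eps):
    """Assumed form of (2.a)-(2.b): theta <- (1 - eps) * theta + eps * Phi(theta)."""
    phi_a, phi_m = natural_map(sample, alphas, means, variances)
    new_a = [(1 - eps) * a + eps * f for a, f in zip(alphas, phi_a)]
    new_m = [(1 - eps) * mu + eps * f for mu, f in zip(means, phi_m)]
    return new_a, new_m

def relaxed_iteration(sample, alphas, means, variances, eps, tol=1e-10, max_iter=10000):
    for _ in range(max_iter):
        new_a, new_m = relaxed_step(sample, alphas, means, variances, eps)
        change = max(abs(u - v) for u, v in zip(new_a + new_m, alphas + means))
        alphas, means = new_a, new_m
        if change < tol:
            break
    return alphas, means

sample = [-2.1, -1.7, -1.2, -0.9, -0.4, 2.2, 2.8, 3.1, 3.5, 4.0]
start = ([0.5, 0.5], [-1.0, 2.0], [1.0, 1.0])
a_full, m_full = relaxed_iteration(sample, *start, eps=1.0)
a_half, m_half = relaxed_iteration(sample, *start, eps=0.5)
```

With `eps=1.0` the step reduces to the natural procedure of Section 3. Because relaxation leaves the set of fixed points unchanged, both runs in the example approach the same estimate. The choice of $\varepsilon$ affects only how the iterates get there.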

This iterative scheme appears to be new, and we feel that it holds
considerable promise. It is as easy to implement in practice as the scheme
described in the preceding section, and it should converge just as rapidly.
Unless m and n are small, it will be a more practical method of obtaining
maximum-likelihood estimates than Newton's method or the method of scoring, as
described by Kale [5]. Newton's method and the method of scoring should, in fact,
require fewer iterations for convergence than this scheme. However,
the computational effort in these methods may be considerable because the inverse
of an (n+1)m × (n+1)m matrix (one row and column for each of the m a priori
probabilities and the mn mean-vector components) must be calculated at each
iteration. The modified versions of Newton's method and the method of scoring
given in [5] will require the same number of iterations as this method. However,
there is additional computation involved at each iteration for these modified methods.

5. Applications to signature extension and other problems. 

The iterative procedures described in the preceding sections appear well
suited for application to the signature extension problem. This problem has
been characterized as that of developing a computationally useful method of
"extending signatures" from one sample segment to geographically nearby sample
segments. In this context, "extending signatures" means modifying a given
set of signature estimates in order to obtain a set which is more useful for
the purposes at hand, e.g., classification or estimation of proportions.

Although incomplete, the results given here are encouraging. The numerical
studies reported in [1], [2], [3], and [4] demonstrate the computational
feasibility of the procedure described in Section 3. The procedure discussed
in Section 4 appears to be no more difficult to implement. All results, both
empirical and theoretical, obtained so far lead one to believe that the
iterative schemes of Sections 3 and 4 will converge to a maximum-likelihood
estimate of the signatures whenever "reasonably good" initial signature
estimates are provided. (Initial signature estimates which lie within a "radius
of local contractibility" of a maximum-likelihood estimate, as suggested by
the results of Sections 3 and 4, can certainly be considered "reasonably good".)

It is our hope that, in practice, initial signature estimates obtained from
"geographically nearby sample segments" will prove to be "reasonably good" in
this sense.

In addition, we anticipate that these iterative procedures will be
profitably applied to other problems of remote sensing. The iterative scheme for
the equations (1.a) is shown in [1] to be a viable approach to the problem
of estimation of proportions. Even more reliable proportion estimates should
result when the remaining equations (1.b) and (1.c) are also utilized to
provide maximum-likelihood estimates of all the signature parameters. (The
equations (2.a), (2.b), and their analogues for the covariance matrices can,
of course, be used to the same end.) Also, an effective solution to the signature
extension problem would appear to be applicable to the adaptive classification
problem. Indeed, this problem, that of continually updating population statistics
on the basis of incoming samples, is clearly closely related to the
signature extension problem from both a mathematical and a statistical point of
view.

6. Future areas of work.

Despite the encouraging results obtained so far concerning the iterative
procedures described in the preceding sections, considerable research remains
to be done. Generally speaking, the major theoretical problem is to determine
the precise circumstances under which these iterative procedures can be expected
to converge to maximum-likelihood estimates. More specifically, the local
convergence results given here must be extended, hopefully to allow any subset
(including all) of the signature parameters to be unknown. In addition, it
is necessary to determine quantitatively how near an initial signature estimate
must lie to a maximum-likelihood estimate in order for the iterates to converge
to a maximum-likelihood estimate.

In the absence of more extensive theoretical results, it will be necessary
to run many numerical trials, varying the unknown parameter sets, the true
population signatures, the starting values, and other factors, in order to
determine empirically when the iterates can be expected to converge to a
maximum-likelihood estimate. Whether or not further theoretical results are
obtained, numerical procedures need to be studied with an eye toward optimizing
computational efficiency. For example, allowing the covariance matrices to
vary arbitrarily in these procedures will require the calculation of their
determinants and inverses at each iteration. Hence, one might study iterative
schemes in which the covariance matrices are assumed to vary in a particularly
simple way, e.g., by multiplication on the left and right by diagonal matrices.
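A short calculation shows why such a restriction is cheap. Suppose $\tilde\Sigma = D\Sigma D$ with $D = \mathrm{diag}(d_1, \dots, d_n)$. Then $\det\tilde\Sigma = (d_1 \cdots d_n)^2 \det\Sigma$ and $\tilde\Sigma^{-1} = D^{-1}\Sigma^{-1}D^{-1}$. A cached determinant and inverse of $\Sigma$ can therefore be rescaled with $O(n^2)$ work rather than recomputed. The helper below is a hypothetical sketch of that update, not a procedure taken from this outline.

```python
import math

def rescale_factors(logdet_sigma, inv_sigma, d):
    """Hypothetical helper: log det and inverse of D Sigma D, with D = diag(d).

    Uses det(D Sigma D) = (d_1 ... d_n)^2 det(Sigma) and
    (D Sigma D)^{-1} = D^{-1} Sigma^{-1} D^{-1}, so no new factorization is needed.
    """
    if any(di == 0.0 for di in d):
        raise ValueError("diagonal scale factors must be nonzero")
    n = len(d)
    logdet = logdet_sigma + 2.0 * sum(math.log(abs(di)) for di in d)
    inv = [[inv_sigma[i][j] / (d[i] * d[j]) for j in range(n)] for i in range(n)]
    return logdet, inv

# A 2 x 2 covariance whose determinant and inverse are known in closed form.
sigma = [[2.0, 0.5], [0.5, 1.0]]
det = sigma[0][0] * sigma[1][1] - sigma[0][1] * sigma[1][0]
inv_sigma = [[sigma[1][1] / det, -sigma[0][1] / det],
             [-sigma[1][0] / det, sigma[0][0] / det]]
logdet, inv = rescale_factors(math.log(det), inv_sigma, [2.0, 3.0])
```

In this example $D\Sigma D = \begin{pmatrix} 8 & 3 \\ 3 & 9 \end{pmatrix}$. Its determinant is $63 = 1.75 \cdot 6^2$, which matches the rescaled log-determinant.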